Exploiting Lexical Dependencies from Large-Scale Data for Better Shift-Reduce Constituency Parsing

Authors

Muhua Zhu

Jingbo Zhu

Huizhen Wang

Abstract

This paper proposes a method to improve shift-reduce constituency parsing by using lexical dependencies. The lexical dependency information is obtained from a large amount of auto-parsed data that is generated by a baseline shift-reduce parser on unlabeled data. We then incorporate a set of novel features defined on this information into the shift-reduce parsing model. The features can help to disambiguate action conflicts during decoding. Experimental results show that the new features achieve absolute improvements over a strong baseline by 0.9% and 1.1% on English and Chinese respectively. Moreover, the improved parser outperforms all previously reported shift-reduce constituency parsers.

Title and Abstract in Chinese

利用大规模数据词汇依存关系改进移进-归约成分句法分析

本文提出了一种利用词汇依存关系改进移进-归约成分句法分析的方法。首先,我们利用基准系统在大规模无标注数据上进行自动句法分析并从分析结果中抽取词汇依存关系。其后,我们在词汇依存信息的基础上定义了一组新特征并将这些特征整合到移进-归约句法分析模型中。新特征用于帮助消除移进-归约过程中的动作歧义。实验结果表明,新特征在英文和中文数据上分别取得了0.9% 和1.1%的性能改进。最终得到的句法分析器的性能优于相关研究工作中所报告的移进-归约句法分析器的性能。
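As a rough illustration of the idea in the abstract, the sketch below counts head-modifier word pairs over auto-parsed sentences and maps each count to a coarse feature value. It is not the authors' implementation: the input format (lists of `(head, modifier)` pairs), the function names `count_dependencies` and `bucket`, and the threshold are all assumptions made for this example.

```python
from collections import Counter


def count_dependencies(parsed_sentences):
    """Count (head, modifier) word pairs over auto-parsed sentences.

    Assumed input format: each sentence is a list of
    (head_word, modifier_word) tuples read off a parse tree.
    """
    counts = Counter()
    for pairs in parsed_sentences:
        counts.update(pairs)
    return counts


def bucket(counts, head, modifier, high=2):
    """Map a pair's raw count to a coarse label (hypothetical threshold)."""
    c = counts[(head, modifier)]  # Counter returns 0 for unseen pairs
    if c >= high:
        return "HIGH"
    if c >= 1:
        return "LOW"
    return "NONE"


# Toy stand-in for dependencies extracted from auto-parsed data.
auto_parsed = [
    [("ate", "dog"), ("ate", "bone")],
    [("ate", "cat"), ("ate", "bone")],
]
counts = count_dependencies(auto_parsed)
```

A shift-reduce parser could consult a label such as `bucket(counts, "ate", "bone")` as one extra feature when choosing between conflicting actions; how the paper actually defines its features is not stated in the abstract.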


Similar resources

Improving shift-reduce constituency parsing with large-scale unlabeled data

Shift-reduce parsing has been studied extensively for diverse grammars due to the simplicity and running efficiency. However, in the field of constituency parsing, shift-reduce parsers lag behind state-of-the-art parsers. In this paper we propose a semi-supervised approach for advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Procee...


Decreasing Lexical Data Sparsity in Statistical Syntactic Parsing - Experiments with Named Entities

In this paper we present preliminary experiments that aim to reduce lexical data sparsity in statistical parsing by exploiting information about named entities. Words in the WSJ corpus are mapped to named entity clusters and a latent variable constituency parser is trained and tested on the transformed corpus. We explore two different methods for mapping words to entities, and look at the effec...


Partial Training for a Lexicalized-Grammar Parser

We propose a solution to the annotation bottleneck for statistical parsing, by exploiting the lexicalized nature of Combinatory Categorial Grammar (CCG). The parsing model uses predicate-argument dependencies for training, which are derived from sequences of CCG lexical categories rather than full derivations. A simple method is used for extracting dependencies from lexical category sequences, ...


Discontinuous parsing with continuous trees

We introduce a new method for incremental shift-reduce parsing of discontinuous constituency trees, based on the fact that discontinuous trees can be transformed into continuous trees by changing the order of the terminal nodes. It allows for a clean formulation of different oracles, leads to faster parsers and provides better results. Our best system achieves an F1 of 80.02 on TIGER.


Uptraining for Accurate Deterministic Question Parsing

It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of th...



Publication date: 2012
